5 research outputs found
Designing a Planetary-Scale IMAP Service with Conflict-free Replicated Data Types
Modern geo-replicated software serving millions of users across the globe faces the consequences of the CAP dilemma, i.e., the inevitable conflicts that arise when multiple nodes accept writes on shared state. The underlying problem is commonly known as fault-tolerant multi-leader replication, which is actively researched in the distributed systems and database communities. As a more recent theoretical framework, Conflict-free Replicated Data Types (CRDTs) propose a solution to this problem by offering a set of always converging primitives. However, modeling non-trivial system state with CRDT primitives is a challenging and error-prone task. In this work, we propose a solution for a geo-replicated online service with fault-tolerant multi-leader replication based on CRDTs. We chose IMAP as a use case due to its prevalence and simplicity. Therefore, we modeled an IMAP-CRDT and verified its correctness with the interactive theorem prover Isabelle/HOL. In order to bridge the gap between theory and practice, we implemented an open-source prototype pluto and an IMAP benchmark for write-intensive workloads. We evaluated our prototype against the standard IMAP server Dovecot on a multi-continent public cloud. The results expose the limitations of Dovecot with respect to response time performance and replication lag. Our prototype was able to leverage its conceptual advantages and outperformed Dovecot. We find that our approach is promising when facing the multitude of potential concurrency bugs in the development of systems at planetary scale.
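As an illustration of what "always converging primitives" means, the following is a minimal sketch of a state-based observed-remove set (OR-Set), a textbook CRDT. It is not the IMAP-CRDT from the paper and not code from pluto; the class name `ORSet`, its methods, and the message identifiers are hypothetical and chosen only for this example.

```python
import uuid


class ORSet:
    """State-based observed-remove set: a textbook CRDT, not the paper's IMAP-CRDT."""

    def __init__(self):
        self.adds = set()      # (element, unique tag) pairs ever added
        self.removes = set()   # (element, unique tag) pairs observed and removed

    def add(self, element):
        # Each add gets a fresh tag so a concurrent add and remove can be told apart.
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Remove only the tags this replica has observed so far.
        self.removes |= {pair for pair in self.adds if pair[0] == element}

    def merge(self, other):
        # Set union is commutative, associative, and idempotent, so replicas converge.
        self.adds |= other.adds
        self.removes |= other.removes

    def value(self):
        return {element for (element, _tag) in self.adds - self.removes}


# Two replicas (think of two hypothetical mail folders) accept writes independently.
a, b = ORSet(), ORSet()
a.add("msg-1")
b.merge(a)           # b now observes msg-1
b.remove("msg-1")    # b deletes it ...
a.add("msg-1")       # ... while a concurrently re-adds it under a new tag
a.add("msg-2")

# Exchange state in both directions; the order of merges does not matter.
a.merge(b)
b.merge(a)
print(sorted(a.value()), a.value() == b.value())
```

Both replicas end up with the same contents without any coordination. In this sketch the concurrent re-add survives the remove (add-wins semantics), which is one possible design choice among several.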
Causality-Guided Adaptive Interventional Debugging
Runtime nondeterminism is a fact of life in modern database applications.
Previous research has shown that nondeterminism can cause applications to
intermittently crash, become unresponsive, or experience data corruption. We
propose Adaptive Interventional Debugging (AID) for debugging such intermittent
failures. AID combines existing statistical debugging, causal analysis, fault
injection, and group testing techniques in a novel way to (1) pinpoint the root
cause of an application's intermittent failure and (2) generate an explanation
of how the root cause triggers the failure. AID works by first identifying a
set of runtime behaviors (called predicates) that are strongly correlated to
the failure. It then utilizes temporal properties of the predicates to
(over)-approximate their causal relationships. Finally, it uses fault injection
to execute a sequence of interventions on the predicates and discover their
true causal relationships. This enables AID to identify the true root cause and
its causal relationship to the failure. We theoretically analyze how fast AID
can converge to the identification. We evaluate AID with six real-world
applications that intermittently fail under specific inputs. In each case, AID
was able to identify the root cause and explain how the root cause triggered
the failure, much faster than group testing and more precisely than statistical
debugging. We also evaluate AID with many synthetically generated applications
with known root causes and confirm that the benefits also hold for them.
Comment: Technical report of AID (SIGMOD 2020)
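The intervention step described in the abstract can be illustrated with plain adaptive group testing by halving. This is a simplified sketch under stated assumptions, not AID itself: it assumes exactly one root cause, ignores the temporal and causal pruning that AID adds, and replaces real fault injection with a hypothetical oracle `failure_persists`. All names are invented for the example.

```python
def find_root_cause(predicates, failure_persists):
    """Adaptive group testing by halving (a simplification, not the AID algorithm).

    predicates: candidate predicates correlated with the failure.
    failure_persists(blocked): hypothetical oracle that reruns the application
        with the predicates in `blocked` suppressed (e.g. via fault injection)
        and returns True if the failure still occurs.
    Assumes exactly one root cause among `predicates`.
    """
    candidates = list(predicates)
    interventions = 0
    while len(candidates) > 1:
        mid = len(candidates) // 2
        half = candidates[:mid]
        interventions += 1
        if failure_persists(set(half)):
            # Suppressing this half did not help: the cause is in the other half.
            candidates = candidates[mid:]
        else:
            # The failure disappeared: the cause is among the suppressed predicates.
            candidates = half
    return candidates[0], interventions


# Simulated application: the failure occurs unless "p5" is suppressed.
predicates = [f"p{i}" for i in range(8)]
oracle = lambda blocked: "p5" not in blocked
cause, runs = find_root_cause(predicates, oracle)
print(cause, runs)  # prints "p5 3"
```

With eight candidates, halving needs three interventions instead of up to eight one-at-a-time runs. The abstract reports that AID is much faster than group testing, so this sketch should be read as the baseline idea rather than as AID's method.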
ALICE Electromagnetic Calorimeter Technical Design Report
The ALICE Electromagnetic Calorimeter technical design is reported.